Data Report — Diabetes 130-US Hospitals for Years 1999-2008

Source: UCI dataset 296

SemMap JSON-LD: dataset.semmap.json · RDFa HTML

Overview

Metric Value
Dataset Diabetes 130-US Hospitals for Years 1999-2008
Source UCI dataset 296
Rows 101,766
Columns 48
Discrete 42
Continuous 6
SemMap SemMap JSON-LD · SemMap HTML
Missingness Not modeled

Variables and summary

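variable inferred_type distribution (discrete: count (share of all rows); continuous: mean ± SD [min, Q1, median, Q3, max])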
race discrete Caucasian [Caucasian]: 77840 (76.49%)
African American [AfricanAmerican]: 19622 (19.28%)
Hispanic [Hispanic]: 2094 (2.06%)
Other [Other]: 1542 (1.52%)
Asian [Asian]: 668 (0.66%)
gender discrete Female [Female]: 54708 (53.76%)
Male [Male]: 47055 (46.24%)
Unknown/Invalid [Unknown/Invalid]: 3 (0.00%)
age discrete [70-80): 26068 (25.62%)
[60-70): 22483 (22.09%)
[50-60): 17256 (16.96%)
[80-90): 17197 (16.90%)
[40-50): 9685 (9.52%)
[30-40): 3775 (3.71%)
[90-100): 2793 (2.74%)
[20-30): 1657 (1.63%)
[10-20): 691 (0.68%)
[0-10): 161 (0.16%)
weight discrete [75-100): 44067 (43.30%)
[50-75): 28250 (27.76%)
[100-125): 18437 (18.12%)
[125-150): 4899 (4.81%)
[25-50): 2915 (2.86%)
[0-25): 1828 (1.80%)
[150-175): 937 (0.92%)
[175-200): 366 (0.36%)
>200: 67 (0.07%)
admission_type_id discrete 1: 53990 (53.05%)
3: 18869 (18.54%)
2: 18480 (18.16%)
6: 5291 (5.20%)
5: 4785 (4.70%)
8: 320 (0.31%)
7: 21 (0.02%)
4: 10 (0.01%)
discharge_disposition_id continuous 3.7156 ± 5.2802 [1, 1, 1, 4, 28]
admission_source_id discrete 7: 57494 (56.50%)
1: 29565 (29.05%)
17: 6781 (6.66%)
4: 3187 (3.13%)
6: 2264 (2.22%)
2: 1104 (1.08%)
5: 855 (0.84%)
3: 187 (0.18%)
20: 161 (0.16%)
9: 125 (0.12%)
… (+7 more)
time_in_hospital discrete 3: 17756 (17.45%)
2: 17224 (16.93%)
1: 14208 (13.96%)
4: 13924 (13.68%)
5: 9966 (9.79%)
6: 7539 (7.41%)
7: 5859 (5.76%)
8: 4391 (4.31%)
9: 3002 (2.95%)
10: 2342 (2.30%)
… (+4 more)
payer_code discrete MC: 65018 (63.89%)
HM: 7934 (7.80%)
SP: 6136 (6.03%)
BC: 5963 (5.86%)
MD: 4711 (4.63%)
UN: 3360 (3.30%)
CP: 3143 (3.09%)
CM: 2283 (2.24%)
OG: 1204 (1.18%)
PO: 703 (0.69%)
… (+7 more)
medical_specialty discrete InternalMedicine: 26599 (26.14%)
Emergency/Trauma: 18817 (18.49%)
Family/GeneralPractice: 13744 (13.51%)
Cardiology: 10034 (9.86%)
Surgery-General: 6145 (6.04%)
Radiologist: 3291 (3.23%)
Orthopedics: 2999 (2.95%)
Nephrology: 2867 (2.82%)
Orthopedics-Reconstructive: 1965 (1.93%)
Pulmonology: 1642 (1.61%)
… (+62 more)
num_lab_procedures continuous 43.0956 ± 19.6744 [1, 31, 44, 57, 132]
num_procedures discrete 0: 46652 (45.84%)
1: 20742 (20.38%)
2: 12717 (12.50%)
3: 9443 (9.28%)
6: 4954 (4.87%)
4: 4180 (4.11%)
5: 3078 (3.02%)
num_medications continuous 16.0218 ± 8.1276 [1, 10, 15, 20, 81]
number_outpatient continuous 0.3694 ± 1.2673 [0, 0, 0, 0, 42]
number_emergency continuous 0.1978 ± 0.9305 [0, 0, 0, 0, 76]
number_inpatient continuous 0.6356 ± 1.2629 [0, 0, 0, 1, 21]
diag_1 discrete 428: 6863 (6.74%)
414: 6584 (6.47%)
786: 4017 (3.95%)
410: 3616 (3.55%)
486: 3509 (3.45%)
427: 2768 (2.72%)
491: 2275 (2.24%)
715: 2151 (2.11%)
682: 2042 (2.01%)
434: 2029 (1.99%)
… (+706 more)
diag_2 discrete 276: 6773 (6.66%)
428: 6687 (6.57%)
250: 6096 (5.99%)
427: 5048 (4.96%)
401: 3754 (3.69%)
496: 3314 (3.26%)
599: 3294 (3.24%)
403: 2836 (2.79%)
414: 2665 (2.62%)
411: 2578 (2.53%)
… (+738 more)
diag_3 discrete 250: 11733 (11.53%)
401: 8424 (8.28%)
276: 5239 (5.15%)
428: 4641 (4.56%)
427: 4005 (3.94%)
414: 3718 (3.65%)
496: 2635 (2.59%)
403: 2395 (2.35%)
585: 2007 (1.97%)
272: 1992 (1.96%)
… (+779 more)
number_diagnoses discrete 9: 49474 (48.62%)
5: 11393 (11.20%)
8: 10616 (10.43%)
7: 10393 (10.21%)
6: 10161 (9.98%)
4: 5537 (5.44%)
3: 2835 (2.79%)
2: 1023 (1.01%)
1: 219 (0.22%)
16: 45 (0.04%)
… (+6 more)
max_glu_serum discrete Norm: 40817 (40.11%)
>300: 36052 (35.43%)
>200: 24897 (24.46%)
A1Cresult discrete >8: 49610 (48.75%)
Norm: 29207 (28.70%)
>7: 22949 (22.55%)
metformin discrete No: 81778 (80.36%)
Steady: 18346 (18.03%)
Up: 1067 (1.05%)
Down: 575 (0.57%)
repaglinide discrete No: 100227 (98.49%)
Steady: 1384 (1.36%)
Up: 110 (0.11%)
Down: 45 (0.04%)
nateglinide discrete No: 101063 (99.31%)
Steady: 668 (0.66%)
Up: 24 (0.02%)
Down: 11 (0.01%)
chlorpropamide discrete No: 101680 (99.92%)
Steady: 79 (0.08%)
Up: 6 (0.01%)
Down: 1 (0.00%)
glimepiride discrete No: 96575 (94.90%)
Steady: 4670 (4.59%)
Up: 327 (0.32%)
Down: 194 (0.19%)
acetohexamide discrete No: 101765 (100.00%)
glipizide discrete No: 89080 (87.53%)
Steady: 11356 (11.16%)
Up: 770 (0.76%)
Down: 560 (0.55%)
glyburide discrete No: 91116 (89.53%)
Steady: 9274 (9.11%)
Up: 812 (0.80%)
Down: 564 (0.55%)
tolbutamide discrete No: 101743 (99.98%)
pioglitazone discrete No: 94438 (92.80%)
Steady: 6976 (6.85%)
Up: 234 (0.23%)
Down: 118 (0.12%)
rosiglitazone discrete No: 95401 (93.75%)
Steady: 6100 (5.99%)
Up: 178 (0.17%)
Down: 87 (0.09%)
acarbose discrete No: 101458 (99.70%)
Steady: 295 (0.29%)
Up: 10 (0.01%)
Down: 3 (0.00%)
miglitol discrete No: 101728 (99.96%)
Steady: 31 (0.03%)
Down: 5 (0.00%)
Up: 2 (0.00%)
troglitazone discrete No: 101763 (100.00%)
tolazamide discrete No: 101727 (99.96%)
Steady: 38 (0.04%)
Up: 1 (0.00%)
examide discrete No: 101766 (100.00%)
citoglipton discrete No: 101766 (100.00%)
insulin discrete No: 47383 (46.56%)
Steady: 30849 (30.31%)
Down: 12218 (12.01%)
Up: 11316 (11.12%)
glyburide-metformin discrete No: 101060 (99.31%)
Steady: 692 (0.68%)
Up: 8 (0.01%)
Down: 6 (0.01%)
glipizide-metformin discrete No: 101753 (99.99%)
glimepiride-pioglitazone discrete No: 101765 (100.00%)
metformin-rosiglitazone discrete No: 101764 (100.00%)
metformin-pioglitazone discrete No: 101765 (100.00%)
change discrete No: 54755 (53.80%)
diabetesMed discrete Yes: 78363 (77.00%)
readmitted discrete NO: 54864 (53.91%)
>30: 35545 (34.93%)
<30: 11357 (11.16%)
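The summaries above use two formats. Discrete variables list up to the ten most frequent levels as count (share of all 101,766 rows) and collapse the rest into "(+N more)". Continuous variables show mean ± SD followed by what appears to be a five-number summary [min, Q1, median, Q3, max]. A minimal standard-library sketch of both formats follows. The function names are hypothetical, and the quartile method ("inclusive", i.e. linear interpolation) and the use of the sample SD are assumptions, since the report does not say how they were computed.

```python
import statistics
from collections import Counter


def summarize_discrete(values, top=10):
    """Return 'level: count (share%)' lines for the most frequent levels."""
    counts = Counter(values)
    n = len(values)
    lines = [f"{level}: {c} ({100 * c / n:.2f}%)" for level, c in counts.most_common(top)]
    hidden = len(counts) - top
    if hidden > 0:
        # Levels beyond the top ones are only counted, as in the table above.
        lines.append(f"… (+{hidden} more)")
    return lines


def summarize_continuous(values):
    """Return 'mean ± sd [min, q1, median, q3, max]' for a numeric column."""
    # Assumed quartile method: linear interpolation between order statistics.
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (
        f"{statistics.mean(values):.4f} ± {statistics.stdev(values):.4f} "
        f"[{min(values):g}, {q1:g}, {med:g}, {q3:g}, {max(values):g}]"
    )
```

Because `:g` drops trailing zeros, integer-valued quartiles print as `1` or `4`, matching the bracketed summaries in the table.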

Fidelity summary

model backend disc_jsd_mean disc_jsd_median cont_ks_mean cont_w1_mean downstream_sign_match
metasyn metasyn 0.0793 0.0434 0.5417 1.2278 n/a
clg_mi2 pybnesian 0.0803 0.0533 0.315 1.5054 n/a
semi_mi5 pybnesian 0.0803 0.0533 0.315 1.5054 n/a
ctgan_fast synthcity 0.2747 0.1882 0.3647 6.7779 n/a
tvae_quick synthcity 0.1394 0.0791 0.1614 1.3723 n/a
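The discrete JSD columns (mean and median) presumably aggregate a per-variable Jensen-Shannon divergence between the real and synthetic category frequencies over the discrete variables. The report does not state the log base, nor whether it reports the divergence itself or its square root (the Jensen-Shannon distance, which is what `scipy.spatial.distance.jensenshannon` returns). The sketch below assumes the divergence with base 2, which keeps values in [0, 1]; the function name is hypothetical.

```python
import math
from collections import Counter


def jsd(real, synth, base=2.0):
    """Jensen-Shannon divergence between the category frequencies of two samples."""
    p_counts, q_counts = Counter(real), Counter(synth)
    levels = set(p_counts) | set(q_counts)
    # Relative frequencies over the union of observed levels (missing levels get 0).
    p = {k: p_counts[k] / len(real) for k in levels}
    q = {k: q_counts[k] / len(synth) for k in levels}
    m = {k: 0.5 * (p[k] + q[k]) for k in levels}

    def kl(a, b):
        # Terms with a[k] == 0 contribute nothing; b[k] > 0 whenever a[k] > 0.
        return sum(a[k] * math.log(a[k] / b[k], base) for k in levels if a[k] > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Taking `statistics.mean` or `statistics.median` of `jsd(real[col], synth[col])` over the discrete columns would give numbers comparable in spirit to the two summary columns, provided the same base and variant are used.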

Privacy summary

model backend n_real n_synth exact_overlap_rate near_duplicate_rate_eps nn_distance_mean k_min k_pct_lt5 k_map rare_qi_reproduction_rate identifiability_score delta_presence
metasyn metasyn 101766 1000 0.046 0.987 0.013 1 0.11 8 0.5641 0.005 1.5385
clg_mi2 pybnesian 101766 1000 0.042 0.981 0.0143 1 0.11 8 0.4744 0.004 1.9362
semi_mi5 pybnesian 101766 1000 0.042 0.981 0.0143 1 0.11 8 0.4744 0.004 1.9362
ctgan_fast synthcity 101766 256 0.0859 0.9102 0.0552 1 0.11 1 0.2564 0.0117 5
tvae_quick synthcity 101766 256 0.1016 0.957 0.0309 1 0.11 2 0.1026 0.0117 2.1429
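The report does not define the record-level overlap metrics. One common reading, assumed here: exact_overlap_rate is the share of synthetic rows identical to some real row, nn_distance_mean is the mean distance from each synthetic row to its nearest real row, and near_duplicate_rate_eps is the share of synthetic rows whose nearest-neighbour distance is at most a threshold eps. The row encoding and scaling, the Euclidean metric, and the function name are assumptions; the report does not give the eps value, so the sketch leaves it as a required parameter.

```python
import math


def overlap_metrics(real_rows, synth_rows, eps):
    """Record-level overlap between synthetic and real rows (assumed definitions).

    Rows are tuples of numbers that have already been encoded and scaled to
    comparable ranges; that preprocessing is an assumption and is not shown.
    """
    real_set = set(real_rows)
    # Share of synthetic rows that copy a real row exactly.
    exact = sum(row in real_set for row in synth_rows) / len(synth_rows)
    # Euclidean distance from each synthetic row to its nearest real row.
    # Brute force is O(n_real * n_synth); a spatial index would be used at scale.
    nn = [min(math.dist(s, r) for r in real_rows) for s in synth_rows]
    return {
        "exact_overlap_rate": exact,
        "nn_distance_mean": sum(nn) / len(nn),
        "near_duplicate_rate_eps": sum(d <= eps for d in nn) / len(nn),
    }
```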

Models


Model: metasyn (metasyn)

Per-variable fidelity
variable type KS W1 JSD
race discrete n/a n/a 0.0689
gender discrete n/a n/a 0.0655
age discrete n/a n/a 0.0818
weight discrete n/a n/a 0.1077
admission_type_id discrete n/a n/a 0.0957
discharge_disposition_id continuous 0.615 1.664 n/a
admission_source_id discrete n/a n/a 0.088
time_in_hospital discrete n/a n/a 0.0832
payer_code discrete n/a n/a 0.1178
medical_specialty discrete n/a n/a 0.2378
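For continuous variables the per-variable table reports a two-sample Kolmogorov-Smirnov statistic (KS, the largest vertical gap between the two empirical CDFs) and the 1-Wasserstein distance (W1, the area between them, in the variable's own units). The sketch below computes both from empirical CDFs with the standard library only. On the same inputs it should agree with `scipy.stats.ks_2samp(...).statistic` and `scipy.stats.wasserstein_distance`, though whether the report used those functions is an assumption; the function names are hypothetical.

```python
import bisect


def ecdf(sorted_xs, x):
    """Right-continuous empirical CDF of a sorted sample, evaluated at x."""
    return bisect.bisect_right(sorted_xs, x) / len(sorted_xs)


def ks_statistic(real, synth):
    a, b = sorted(real), sorted(synth)
    # Both ECDFs only jump at observed points, so the supremum is attained there.
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a) | set(b))


def wasserstein_1(real, synth):
    """Integral of |F_a(x) - F_b(x)| dx, summed over intervals between observed points."""
    a, b = sorted(real), sorted(synth)
    grid = sorted(set(a) | set(b))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        # The ECDFs are constant on [left, right), so each piece is a rectangle.
        total += abs(ecdf(a, left) - ecdf(b, left)) * (right - left)
    return total
```

KS is unitless and capped at 1, while W1 scales with the variable's range, which is why W1 varies far more across models in the summary table.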
Downstream metrics
metric value
sign_match_rate n/a
formula n/a
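The downstream section lists sign_match_rate and formula without values, so neither the model nor the formula is recoverable from this report. A common reading of a sign-match rate, assumed here: fit the same model (for example, a regression with readmitted as the outcome) once on real and once on synthetic data, then report the share of coefficients whose signs agree. The sketch covers only the comparison step; the coefficient dictionaries, term names, and function name are hypothetical.

```python
def sign_match_rate(real_coefs, synth_coefs):
    """Share of coefficients whose sign agrees between the real and synthetic fits."""
    if real_coefs.keys() != synth_coefs.keys():
        raise ValueError("both fits must report the same terms")

    def sign(v):
        # -1, 0 or +1; exact zeros only match other exact zeros.
        return (v > 0) - (v < 0)

    matches = sum(sign(real_coefs[t]) == sign(synth_coefs[t]) for t in real_coefs)
    return matches / len(real_coefs)
```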
Privacy metrics
metric value
n_real 101766
n_synth 1000
exact_overlap_rate 0.046
near_duplicate_rate_eps 0.987
nn_distance_mean 0.013
k_min 1
k_pct_lt5 0.11
k_map 8
rare_qi_reproduction_rate 0.5641
identifiability_score 0.005
delta_presence 1.5385
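k_min, k_pct_lt5 and k_map are k-anonymity style counts over a set of quasi-identifier (QI) columns that the report does not name. The sketch below assumes: k_min is the size of the smallest group of real rows sharing a QI combination; k_pct_lt5 is the share of real rows in groups smaller than 5; and k_map is the smallest number of real rows matching a synthetic row's QI combination, skipping synthetic combinations that never occur in the real data. Under that reading k_min and k_pct_lt5 depend only on the real data, which would be consistent with their being identical (1 and 0.11) for every model. The QI choice and function name are hypothetical.

```python
from collections import Counter


def k_metrics(real_rows, synth_rows, qi, small=5):
    """k-anonymity style counts over hypothetical quasi-identifier columns.

    real_rows / synth_rows are lists of dicts; qi lists the QI column names.
    """
    def key(row):
        return tuple(row[c] for c in qi)

    real_classes = Counter(key(r) for r in real_rows)
    k_min = min(real_classes.values())
    # Share of real rows that sit in an equivalence class smaller than `small`.
    pct_small = sum(c for c in real_classes.values() if c < small) / len(real_rows)
    # k-map: real-row count behind each synthetic row's QI combination
    # (combinations absent from the real data are skipped, an assumption).
    matches = [real_classes[key(s)] for s in synth_rows if key(s) in real_classes]
    return {
        "k_min": k_min,
        f"k_pct_lt{small}": pct_small,
        "k_map": min(matches) if matches else None,
    }
```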
Fitted per-variable distributions
variable distribution
race core.multinoulli
gender core.multinoulli
age core.multinoulli
weight core.multinoulli
admission_type_id core.multinoulli
discharge_disposition_id core.truncated_normal
admission_source_id core.multinoulli
time_in_hospital core.multinoulli
payer_code core.multinoulli
medical_specialty core.multinoulli
num_lab_procedures core.truncated_normal
num_procedures core.multinoulli
num_medications core.truncated_normal
number_outpatient core.truncated_normal
number_emergency core.truncated_normal
number_inpatient core.truncated_normal
diag_1 core.multinoulli
diag_2 core.multinoulli
diag_3 core.multinoulli
number_diagnoses core.multinoulli
max_glu_serum core.multinoulli
A1Cresult core.multinoulli
metformin core.multinoulli
repaglinide core.multinoulli
nateglinide core.multinoulli
chlorpropamide core.multinoulli
glimepiride core.multinoulli
acetohexamide core.multinoulli
glipizide core.multinoulli
glyburide core.multinoulli
tolbutamide core.multinoulli
pioglitazone core.multinoulli
rosiglitazone core.multinoulli
acarbose core.multinoulli
miglitol core.multinoulli
troglitazone core.multinoulli
tolazamide core.multinoulli
examide core.multinoulli
citoglipton core.multinoulli
insulin core.multinoulli
glyburide-metformin core.multinoulli
glipizide-metformin core.multinoulli
glimepiride-pioglitazone core.multinoulli
metformin-rosiglitazone core.multinoulli
metformin-pioglitazone core.multinoulli
change core.multinoulli
diabetesMed core.multinoulli
readmitted core.multinoulli
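Each variable above gets its own fitted distribution: core.multinoulli (a categorical distribution over the observed levels) or core.truncated_normal (a normal distribution restricted to an interval). The sketch below imitates that idea with the standard library; it is not metasyn's code. In particular, the truncated normal uses the plug-in sample mean and SD with the observed min and max as bounds instead of a maximum-likelihood fit, and it samples by rejection. Columns are fitted and sampled independently, so the sketch reproduces marginals only. All names are hypothetical.

```python
import random
import statistics
from collections import Counter


def fit_multinoulli(values):
    """Category probabilities estimated as observed relative frequencies."""
    counts = Counter(values)
    return {level: c / len(values) for level, c in counts.items()}


def sample_multinoulli(probs, k, rng):
    levels = list(probs)
    return rng.choices(levels, weights=[probs[lv] for lv in levels], k=k)


def fit_truncated_normal(values):
    # Simplification: plug-in moments and observed bounds, not an MLE fit.
    return {
        "mu": statistics.mean(values),
        "sigma": statistics.stdev(values),
        "lower": min(values),
        "upper": max(values),
    }


def sample_truncated_normal(params, k, rng):
    """Rejection sampling: draw from the untruncated normal, keep in-bounds draws."""
    out = []
    while len(out) < k:
        x = rng.gauss(params["mu"], params["sigma"])
        if params["lower"] <= x <= params["upper"]:
            out.append(x)
    return out
```

Note that the truncated normal returns floats even for count columns such as num_medications, so a real generator would need to round or otherwise handle integer-valued variables.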

Model: clg_mi2 (pybnesian)

Per-variable fidelity
variable type KS W1 JSD
race discrete n/a n/a 0.0628
gender discrete n/a n/a 0.0602
age discrete n/a n/a 0.0564
weight discrete n/a n/a 0.1179
admission_type_id discrete n/a n/a 0.0867
discharge_disposition_id continuous 0.329 2.8717 n/a
admission_source_id discrete n/a n/a 0.0885
time_in_hospital discrete n/a n/a 0.0898
payer_code discrete n/a n/a 0.1586
medical_specialty discrete n/a n/a 0.1933
Privacy metrics
metric value
n_real 101766
n_synth 1000
exact_overlap_rate 0.042
near_duplicate_rate_eps 0.981
nn_distance_mean 0.0143
k_min 1
k_pct_lt5 0.11
k_map 8
rare_qi_reproduction_rate 0.4744
identifiability_score 0.004
delta_presence 1.9362
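rare_qi_reproduction_rate is not defined in the report either. One plausible reading, assumed here: take the QI combinations that are rare in the real data (fewer rows than some threshold) and report the share of them that also appear in the synthetic data. The QI columns, the threshold of 5, and the function name are hypothetical.

```python
from collections import Counter


def rare_qi_reproduction_rate(real_rows, synth_rows, qi, rare_below=5):
    """Share of rare real QI combinations that reappear in the synthetic rows.

    Returns None when the real data contains no rare combination.
    """
    def key(row):
        return tuple(row[c] for c in qi)

    real_counts = Counter(key(r) for r in real_rows)
    rare = {combo for combo, c in real_counts.items() if c < rare_below}
    if not rare:
        return None
    synth_combos = {key(s) for s in synth_rows}
    return len(rare & synth_combos) / len(rare)
```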

Model: semi_mi5 (pybnesian)

Per-variable fidelity
variable type KS W1 JSD
race discrete n/a n/a 0.0628
gender discrete n/a n/a 0.0602
age discrete n/a n/a 0.0564
weight discrete n/a n/a 0.1179
admission_type_id discrete n/a n/a 0.0867
discharge_disposition_id continuous 0.329 2.8717 n/a
admission_source_id discrete n/a n/a 0.0885
time_in_hospital discrete n/a n/a 0.0898
payer_code discrete n/a n/a 0.1586
medical_specialty discrete n/a n/a 0.1933
Privacy metrics
metric value
n_real 101766
n_synth 1000
exact_overlap_rate 0.042
near_duplicate_rate_eps 0.981
nn_distance_mean 0.0143
k_min 1
k_pct_lt5 0.11
k_map 8
rare_qi_reproduction_rate 0.4744
identifiability_score 0.004
delta_presence 1.9362

Model: ctgan_fast (synthcity)

Per-variable fidelity
variable type KS W1 JSD
race discrete n/a n/a 0.234
gender discrete n/a n/a 0.0905
age discrete n/a n/a 0.329
weight discrete n/a n/a 0.295
admission_type_id discrete n/a n/a 0.3584
discharge_disposition_id continuous 0.245 1.8958 n/a
admission_source_id discrete n/a n/a 0.2299
time_in_hospital discrete n/a n/a 0.9173
payer_code discrete n/a n/a 0.8811
medical_specialty discrete n/a n/a 0.9886
Privacy metrics
metric value
n_real 101766
n_synth 256
exact_overlap_rate 0.0859
near_duplicate_rate_eps 0.9102
nn_distance_mean 0.0552
k_min 1
k_pct_lt5 0.11
k_map 1
rare_qi_reproduction_rate 0.2564
identifiability_score 0.0117
delta_presence 5

Model: tvae_quick (synthcity)

Per-variable fidelity
variable type KS W1 JSD
race discrete n/a n/a 0.1456
gender discrete n/a n/a 0.0817
age discrete n/a n/a 0.124
weight discrete n/a n/a 0.2066
admission_type_id discrete n/a n/a 0.1353
discharge_disposition_id continuous 0.5095 0.4571 n/a
admission_source_id discrete n/a n/a 0.1622
time_in_hospital discrete n/a n/a 0.3194
payer_code discrete n/a n/a 0.4272
medical_specialty discrete n/a n/a 0.5594
Privacy metrics
metric value
n_real 101766
n_synth 256
exact_overlap_rate 0.1016
near_duplicate_rate_eps 0.957
nn_distance_mean 0.0309
k_min 1
k_pct_lt5 0.11
k_map 2
rare_qi_reproduction_rate 0.1026
identifiability_score 0.0117
delta_presence 2.1429